AITopics | answer length

answer length

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Comparative Analysis of 47 Context-Based Question Answer Models Across 8 Diverse Datasets

Muneeb, Muhammad, Ascher, David B., Bakht, Ahsan Baidar

arXiv.org Artificial Intelligence, Dec-2-2025

Context-based question answering (CBQA) models provide more accurate and relevant answers by considering contextual information. They extract specific information from a given context, which makes them useful in applications such as user support, information retrieval, and educational platforms. In this manuscript, we benchmarked the performance of 47 CBQA models from Hugging Face on eight different datasets. This study aims to identify the best-performing model across diverse datasets without additional fine-tuning. This is valuable for practical applications because it minimizes the need to retrain models for specific datasets and streamlines their deployment in various contexts. The best-performing models were trained on the SQuAD v2 or SQuAD v1 datasets. The best-performing model was ahotrod/electra_large_discriminator_squad2_512, which yielded 43% accuracy across all datasets. We observed that the computation time of all models depends on the context length and the model size. Model performance usually decreases as the answer length increases. Moreover, model performance depends on the context complexity. We also used a genetic algorithm to improve the overall accuracy by integrating responses from other models. ahotrod/electra_large_discriminator_squad2_512 generated the best results for bioasq10b-factoid (65.92%), biomedical_cpgQA (96.45%), QuAC (11.13%), and Question Answer Dataset (41.6%). Bert-large-uncased-whole-word-masking-finetuned-squad achieved an accuracy of 82% on the IELTS dataset.
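
The observation that performance tends to drop as answers get longer can be checked by grouping predictions by reference answer length. The following is a minimal Python sketch of such a breakdown, not the authors' evaluation code: the function names, the bucket boundaries, and the use of case-insensitive exact match as the definition of "accuracy" are all assumptions, since the abstract does not say how accuracy was computed.

```python
from collections import defaultdict


def length_bucket(n_words, edges=(2, 5, 10)):
    """Return a label such as '1-2', '3-5', '6-10' or '11+' for a word count.

    The bucket edges are illustrative assumptions, not values from the paper.
    """
    lower = 1
    for upper in edges:
        if n_words <= upper:
            return f"{lower}-{upper}"
        lower = upper + 1
    return f"{lower}+"


def accuracy_by_answer_length(pairs, edges=(2, 5, 10)):
    """Compute accuracy per answer-length bucket.

    `pairs` is an iterable of (prediction, reference) strings with
    non-empty references. Length is measured in whitespace-separated
    words of the reference answer.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for prediction, reference in pairs:
        bucket = length_bucket(len(reference.split()), edges)
        totals[bucket] += 1
        # Assumption: correct means a case-insensitive exact match.
        hits[bucket] += int(prediction.strip().lower() == reference.strip().lower())
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    # Hypothetical toy data, only to show the call shape.
    toy_pairs = [
        ("Paris", "paris"),
        ("Berlin", "Rome"),
        ("the red planet Mars", "the red planet Mars"),
    ]
    print(accuracy_by_answer_length(toy_pairs))
```

Swapping the exact-match line for a softer score (for example token-level F1) would give a less strict view of the same trend.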

large language model, machine learning, question answering, (21 more...)

arXiv:2512.00323

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Oceania > Australia > Queensland (0.04)
Europe > Austria (0.04)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.68)

Industry:

Health & Medicine (1.00)
Energy (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Can LLMs Design Good Questions Based on Context?

Zhang, Yueheng, Liu, Xiaoyuan, Sun, Yiyou, Alharbi, Atheer, Alzahrani, Hend, Alomair, Basel, Song, Dawn

arXiv.org Artificial Intelligence, Jan-6-2025

This paper evaluates questions generated by LLMs from context, comparing them to human-generated questions across six dimensions. We introduce an automated LLM-based evaluation method, focusing on aspects like question length, type, context coverage, and answerability. Our findings highlight unique characteristics of LLM-generated questions, contributing insights that can support further research in question quality and downstream applications.

large language model, machine learning, natural language, (17 more...)

arXiv:2501.03491

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation

Xie, Shiming, Chen, Hong, Yu, Fred, Sun, Zeye, Wu, Xiuyu

arXiv.org Artificial Intelligence, Aug-20-2024

Instruct LLMs provide a paradigm used in large-scale language models to align LLMs with human preferences. The paradigm comprises supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). This paradigm is also used in downstream scenarios to adapt LLMs to specific corpora and applications. Compared to SFT, many efforts have focused on RLHF, and several algorithms have been proposed, such as PPO, DPO, IPO, KTO, and MinorDPO. Meanwhile, most efforts for SFT focus on how to collect, filter, and mix high-quality data. In this article, with insight from DPO and MinorDPO, we propose a training metric for SFT that measures the discrepancy between the optimized model and the original model, and a loss function, MinorSFT, that can increase training effectiveness and reduce the discrepancy between the optimized LLM and the original LLM.

deviation, sft, zhang, (17 more...)

arXiv:2408.10642

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

MedLM: Exploring Language Models for Medical Question Answering Systems

Yagnik, Niraj, Jhaveri, Jay, Sharma, Vivek, Pila, Gabriel, Ben, Asma, Shang, Jingbo

arXiv.org Artificial Intelligence, Jan-20-2024

In the face of rapidly expanding online medical literature, automated systems for aggregating and summarizing information are becoming increasingly crucial for healthcare professionals and patients. Large Language Models (LLMs), with their advanced generative capabilities, have shown promise in various NLP tasks, and their potential in the healthcare domain, particularly for Closed-Book Generative QnA, is significant. However, the performance of these models in domain-specific tasks such as medical Q&A remains largely unexplored. This study aims to fill this gap by comparing the performance of general and medical-specific distilled LMs for medical Q&A. We aim to evaluate the effectiveness of fine-tuning domain-specific LMs and compare the performance of different families of Language Models. The study will address critical questions about these models' reliability, comparative performance, and effectiveness in the context of medical Q&A. The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.

dataset, evaluation, language model, (14 more...)

arXiv:2401.11389

Country: North America > United States > California > San Diego County > San Diego (0.05)

Genre: Research Report (1.00)

Industry: Health & Medicine > Health Care Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Optimizing Retrieval-augmented Reader Models via Token Elimination

Berchansky, Moshe, Izsak, Peter, Caciularu, Avi, Dagan, Ido, Wasserblat, Moshe

arXiv.org Artificial Intelligence, Nov-5-2023

Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

answer length, natural language, question answering, (18 more...)

arXiv:2310.13682

Country:

Asia > Middle East > Israel (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Nephrology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

So, ByungHoon, Byun, Kyuhong, Kang, Kyungwon, Cho, Seongjin

arXiv.org Artificial Intelligence, Feb-3-2022

Question Answering (QA) is a task in which a machine understands a given document and a question to find an answer. Despite impressive progress in the NLP area, QA is still a challenging problem, especially for non-English languages, due to the lack of annotated datasets. In this paper, we present the Japanese Question Answering Dataset, JaQuAD, which is annotated by humans. JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles. We fine-tuned a baseline model that achieves an F1 score of 78.92% and an exact match (EM) score of 63.38% on the test set. The dataset and our experiments are available at https://github.com/SkelterLabsInc/JaQuAD.
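
For readers unfamiliar with the two reported metrics, the sketch below shows how exact match and token-level F1 are commonly computed for extractive QA, following the SQuAD-style convention. It is an illustration under assumptions, not the JaQuAD evaluation script: the normalization rules are the usual English ones, and the `tokenize` parameter is a hypothetical hook added here because Japanese text has no whitespace word boundaries. The abstract does not state which tokenization the authors used.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference, tokenize=str.split):
    """Token-level F1 between a predicted and a reference answer.

    `tokenize` defaults to whitespace splitting; passing `list` gives a
    character-level variant, one possible choice for Japanese (assumption).
    """
    pred_tokens = tokenize(normalize(prediction))
    ref_tokens = tokenize(normalize(reference))
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Tokyo Tower", "tokyo tower"))
    print(f1_score("Tokyo Tower in Japan", "Tokyo Tower"))
    print(f1_score("東京タワー", "東京", tokenize=list))
```

Dataset-level scores such as those quoted above are typically the mean of these per-example values, usually taking the best score over all reference answers when a question has several.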

dataset, jaquad, question type, (12 more...)

arXiv:2202.01764

Country:

Asia > Japan > Honshū > Tōhoku (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Germany (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Education > Assessment & Standards > Student Performance (0.42)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)
